Entry Name:   IIITH-YASHASWI-MC2

VAST Challenge 2014
Mini-Challenge 2

 

 

Team Members:

Yashaswi Pochampally, International Institute of Information Technology Hyderabad, p.yashaswi@students.iiit.ac.in (PRIMARY)
Navya Yarrabelly, International Institute of Information Technology Hyderabad, yarrabelly.navya@students.iiit.ac.in
Veera Raghavendra Chikka, International Institute of Information Technology Hyderabad, raghavendra.ch@research.iiit.ac.in
Kamalakar Karlapalem (Advisor), International Institute of Information Technology Hyderabad, kamal@iiit.ac.in



Student Team: YES

 

Analytic Tools Used:

1) We used a slightly modified version of the GeoTools toolkit to visualize the geospatial data (gps.csv) and thereby the paths along which the cars move.
Source: http://www.geotools.org/
2) We used QGIS to locate and label the placemarks mentioned in the loyalty/credit card transactions.
Source: https://www.qgis.org/en/site/forusers/download.html
3) We used D3.js to build the data visualizations that helped reveal unusual patterns in the data.
Source: http://d3js.org/

 

Approximately how many hours were spent working on this submission in total?

150 hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete? YES

 

Video:

https://www.youtube.com/watch?v=0KMLvbnLQmI

 

IIIT H MC2VIDEO

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC2.1    Describe common daily routines for GAStech employees. What does a day in the life of a typical GAStech employee look like? Please limit your response to no more than five images and 300 words.

 

We developed an interface that uses GeoTools to show the paths along which cars move at a particular time or on a particular date, as shown in Fig 1.1 (click on the image to view the interface).

Color legend for frequency (color swatches not preserved): very high frequency, high frequency, low frequency, very low frequency

Figure 1.1: Interface showing geospatial patterns of car movement [Click on the image to view the interface]

The colors overlaid on the shapefile (drawn as yellow lines over the tourist map of Abila) in the generated images depict the frequency of cars at each geospatial location. The table below shows the patterns found by analysing the paths of the cars in each hour of the day, across all days.

Time                Frequency
00 hrs to 06 hrs    very low frequency throughout
07 hrs to 09 hrs    high frequency near GAStech
09 hrs to 11 hrs    low frequency throughout
11 hrs to 12 hrs    high frequency near GAStech
12 hrs to 14 hrs    high frequency near GAStech and along the paths joining GAStech and important placemarks, as shown in Fig 1.2
14 hrs to 15 hrs    high frequency near GAStech
15 hrs to 24 hrs    low frequency throughout

Figure 1.2: Red patterns showing high-frequency paths
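
The hourly binning behind a table like the one above could be sketched as follows. This is only an illustrative Python sketch, not the code of the interface: the timestamp layout `MM/DD/YYYY HH:MM:SS`, the function name `records_per_hour`, and the sample rows are all assumptions.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout for rows of gps.csv; adjust if the file differs.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def records_per_hour(timestamps):
    """Count GPS records falling in each hour of the day (0-23)."""
    counts = Counter()
    for raw in timestamps:
        counts[datetime.strptime(raw, TIMESTAMP_FORMAT).hour] += 1
    return counts


# Hypothetical sample rows, for illustration only.
sample = [
    "01/06/2014 07:20:01",
    "01/06/2014 07:45:13",
    "01/06/2014 12:05:40",
    "01/06/2014 03:10:00",
]
hourly = records_per_hour(sample)
busiest_hour = max(hourly, key=hourly.get)
```

Hours with many records correspond to the "high frequency" rows; restricting the input to records near one placemark would give the per-location view.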

From this we can conclude that the working hours of GAStech employees are 7 am to 3 pm.
Similarly, if we analyse the data by day of the week, excluding weekends (Sat & Sun), we see that most employees follow a particular pattern in their paths, as shown in Fig 1.3. From this we understand that employees frequently visit certain places from the workplace, including Barwyn Street, Jack's Magical Beans, Abila Airport and Guy's Gyros.

Figure 1.3: Image showing path patterns for weekdays

Fig 1.4 shows the number of transactions on each day. Analysing the credit card/loyalty card transactions, we see that the number of transactions is relatively low (<90) during the weekends (12th, 13th, 14th and 15th). On weekdays, the average number of transactions per day is around 130, which suggests that most of the transactions are related to the company.

Figure 1.4: Number of transactions for various days
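
A weekday/weekend comparison of this kind can be sketched in a few lines of Python. The function names and the sample dates below are hypothetical and only illustrate the counting; they are not taken from the challenge data.

```python
from collections import Counter
from datetime import date


def transactions_per_day(dates):
    """Count transactions on each calendar date."""
    return Counter(dates)


def _mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def average_by_day_type(per_day):
    """Return (weekday_average, weekend_average) of the daily counts."""
    weekday = [n for d, n in per_day.items() if d.weekday() < 5]
    weekend = [n for d, n in per_day.items() if d.weekday() >= 5]
    return _mean(weekday), _mean(weekend)


# Hypothetical transaction dates, for illustration only.
sample_dates = (
    [date(2014, 1, 6)] * 3      # a Monday
    + [date(2014, 1, 7)] * 5    # a Tuesday
    + [date(2014, 1, 11)] * 2   # a Saturday
)
per_day = transactions_per_day(sample_dates)
weekday_avg, weekend_avg = average_by_day_type(per_day)
```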

Fig 1.5 shows an interface (similar to that in Fig 1.1) that helps analyse data specific to a particular employee type. Here we included locations not only from the GPS data but also from the loyalty/credit card data. From the figure, the locations frequently visited by each employee type are as follows:

Employee Type             Locations
Engineering               Hippokampos, Been There Done That
Executive                 Hippokampos, Jack's Magical Beans, Brewed Awakenings
Information Technology    Hallowed Grounds, Ouzeri Elian
Facilities                Abila Airport, Carlyle Chemical Inc., Nationwide Refinery
Security                  Many places (Guy's Gyros relatively more often)

Figure 1.5: Interface showing geospatial patterns of car movement by employee type [Click on the image to view the interface]

 

MC2.2    Identify up to twelve unusual events or patterns that you see in the data. If you identify more than twelve patterns during your analysis, focus your answer on the patterns you consider to be most important for further investigation to help find the missing staff members. For each pattern or event you identify, describe

1.       What is the pattern or event you observe?

2.       Who is involved?

3.       What locations are involved?

4.       When does the pattern or event take place?

5.       Why is this pattern or event significant?

6.       What is your level of confidence about this pattern or event? Why?

 

Please limit your answer to no more than twelve images and 1500 words.

 

We merged the GPS locations with the locations in the credit/loyalty card transactions and found a few instances where the same person appeared to be at two different locations at the same time (i.e. the location in the loyalty/credit card transaction differs from the GPS location). This can happen only when the credit card and/or the car is used by someone other than the employee, so we treat such instances as unusual patterns. Some of these patterns are shown below. We used QGIS to visualize this data.
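
One way to automate such a comparison is sketched below in Python. It is an illustration under stated assumptions, not the procedure we followed in QGIS: the shop coordinates, the 0.5 km distance threshold, the 10-minute time window, and the names `haversine_km` and `find_conflicts` are all hypothetical.

```python
import math
from datetime import datetime, timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def find_conflicts(transactions, gps_fixes, max_km=0.5, window=timedelta(minutes=10)):
    """Flag transactions whose shop is far from the car's nearest-in-time GPS fix.

    transactions: list of (person, time, shop_lat, shop_lon)
    gps_fixes:    dict mapping person -> list of (time, lat, lon)
    """
    conflicts = []
    for person, when, shop_lat, shop_lon in transactions:
        nearby = [f for f in gps_fixes.get(person, []) if abs(f[0] - when) <= window]
        if not nearby:
            continue  # no GPS evidence either way inside the window
        _, lat, lon = min(nearby, key=lambda f: abs(f[0] - when))
        dist = haversine_km(lat, lon, shop_lat, shop_lon)
        if dist > max_km:
            conflicts.append((person, when, round(dist, 2)))
    return conflicts


# Hypothetical data: person "A" is at the shop, person "B" is several km away.
t = datetime(2014, 1, 6, 12, 30)
sample_transactions = [("A", t, 36.0600, 24.8600), ("B", t, 36.0600, 24.8600)]
sample_gps = {
    "A": [(t - timedelta(minutes=2), 36.0601, 24.8601)],
    "B": [(t + timedelta(minutes=1), 36.0900, 24.9000)],
}
conflicts = find_conflicts(sample_transactions, sample_gps)
```

A transaction with no GPS fix inside the window is skipped rather than flagged; a stricter variant could fall back on the last known position of the car.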

First, let's consider instances where the person was supposed to be at Kronos Mart (according to the credit card data) but was not. Fig 2.1 shows the difference in locations.

Unusual Pattern 1

Figure 2.1:Unusual pattern 1

Now let's consider instances where the person was supposed to be at Been There Done That (according to the credit card data). Fig 2.2 shows these.

Unusual Pattern 2

Unusual Pattern 3

Figure 2.2:Unusual patterns 2,3

Lastly, we have instances where the person was supposed to be at Jack's Magical Beans. Fig 2.3 shows these.

Unusual Pattern 4

Unusual Pattern 5

Figure 2.3:Unusual patterns 4,5

Considering only the transaction amounts, we see that employees in Facilities had total transaction amounts near 20,000. Excluding them, the remaining employees had totals below 5,000, with Lucas Alcazar as an exception whose total is much higher (10,584), as shown in Fig 2.4.

Fig 2.4: Graph showing total transaction amounts per person

If we look further into all the transactions of Lucas Alcazar (Fig 2.5), we see that the high total is due to a single transaction, so we consider that transaction an unusual event.

Fig 2.5: Graph showing all transactions of Lucas Alcazar (Unusual Event 6)
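
The per-person totals, and the check for a single transaction that dominates one person's total, could be sketched as below. The 50% share threshold, the function names, and the sample amounts are hypothetical choices for illustration, not values from the data.

```python
from collections import defaultdict


def totals_per_person(transactions):
    """Sum transaction amounts per person; transactions are (person, amount) pairs."""
    totals = defaultdict(float)
    for person, amount in transactions:
        totals[person] += amount
    return dict(totals)


def dominant_transactions(transactions, share=0.5):
    """Return (person, amount) pairs where one transaction exceeds `share` of that person's total."""
    totals = totals_per_person(transactions)
    return [
        (person, amount)
        for person, amount in transactions
        if totals[person] > 0 and amount / totals[person] > share
    ]


# Hypothetical amounts, for illustration only.
sample = [
    ("P1", 20.0), ("P1", 35.0), ("P1", 25.0),
    ("P2", 15.0), ("P2", 1000.0), ("P2", 30.0),
]
totals = totals_per_person(sample)
flagged = dominant_transactions(sample)
```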

Unusual event 6

 

Fig 2.6: Graph showing standard deviation of amounts across various locations (click on the image to view specific values)

We computed the standard deviation of the transaction amounts for each location separately. A high standard deviation at a location suggests that it has some unusual transactions (abnormally high or low amounts). The places with high standard deviations are Maximum Iron and Steel, Nationwide Refinery, Kronos Pipe and Irrigation, Abila Airport, Stewart and Sons Fabrication, Carlyle Chemical Inc., Abila Scrapyard and Frydos Autosupply n' More.
By analysing the transactions specific to these locations, we found 3 unusual events.
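
The per-location standard deviation can be sketched with the Python standard library as follows. Whether the population or the sample standard deviation is used is an assumption (population, via `statistics.pstdev`, is chosen here), and the location names and amounts are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def stdev_by_location(transactions):
    """Population standard deviation of amounts per location.

    transactions: list of (location, amount) pairs.
    """
    amounts = defaultdict(list)
    for location, amount in transactions:
        amounts[location].append(amount)
    return {loc: pstdev(vals) for loc, vals in amounts.items()}


# Hypothetical amounts, for illustration only.
sample = [
    ("Cafe", 10.0), ("Cafe", 12.0), ("Cafe", 14.0),
    ("Depot", 100.0), ("Depot", 5000.0),
]
spread = stdev_by_location(sample)
# Locations ordered from most to least variable.
ranked = sorted(spread.items(), key=lambda item: item[1], reverse=True)
```

Locations at the top of `ranked` are the ones whose individual transactions are worth inspecting.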

Unusual event 7

 

Fig 2.7:Graph showing transactions in Abila Airport(Unusual Event 7)

Unusual event 8

Fig 2.8: Graph showing transactions in Carlyle Chemical Inc. (Unusual Event 8)

Unusual event 9

Fig 2.9:Graph showing transactions in Stewart and Sons Fabrication(Unusual Event 9)

We also noticed events where an employee of a specific type visited places that no or very few employees of that type, or of any other type, visited.

Fig 2.10 : Graph showing number of times an employee has gone to some specific place

Fig 2.11: Graph focusing on the less frequently visited locations from Fig 2.10 (Unusual Events 10, 11, 12)
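
Counting visits per employee and isolating rarely visited locations could be sketched as below. The cut-off of one distinct visitor, the function name `rare_locations`, and the sample visits are hypothetical.

```python
from collections import Counter


def rare_locations(visits, max_visitors=1):
    """Return locations visited by at most `max_visitors` distinct employees.

    visits: list of (employee, location) pairs.
    """
    visitors = {}
    for employee, location in visits:
        visitors.setdefault(location, set()).add(employee)
    return {
        loc: sorted(emps)
        for loc, emps in visitors.items()
        if len(emps) <= max_visitors
    }


# Hypothetical visits, for illustration only.
sample_visits = [
    ("E1", "Cafe"), ("E2", "Cafe"), ("E3", "Cafe"), ("E1", "Cafe"),
    ("E2", "Warehouse"),
]
visit_counts = Counter(sample_visits)   # number of visits per (employee, location)
rare = rare_locations(sample_visits)
```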

Unusual event 10

 

Unusual event 11

 

Unusual event 12

 

 

 

MC2.3    Like most datasets, the data you were provided is imperfect, with possible issues such as missing data, conflicting data, data of varying resolutions, outliers, or other kinds of confusing data.  Considering MC2 data is primarily spatiotemporal, describe how you identified and addressed the uncertainties and conflicts inherent in this data to reach your conclusions in questions MC2.1 and MC2.2. Please limit your response to no more than five images and 300 words.

 


Missing Data:
Most of the employees whose transaction amounts were analysed were assigned to one of the employee types (in the car assignments file), but a few weren't. Fig 3.1 shows the classification of employees. The daily routines specific to an employee type were therefore derived from the classified employees only.

Figure 3.1 :Classification of employees to employee types

The same was true of the car IDs in the GPS data. Most were assigned to an employee, but a few weren't, as Fig 3.2 shows. Here too, the daily routines specific to an employee type were derived from the classified ones only.

Figure 3.2: Classification of car IDs
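
Separating assigned from unassigned car IDs amounts to a lookup against the assignments table, as in the minimal Python sketch below. The function name and the sample IDs and types are hypothetical.

```python
def split_by_assignment(car_ids, assignments):
    """Partition car IDs into those with a known employee type and those without.

    car_ids:     iterable of car IDs seen in the GPS data
    assignments: dict mapping car ID -> employee type
    """
    assigned = {c: assignments[c] for c in car_ids if c in assignments}
    unassigned = sorted(set(car_ids) - set(assignments))
    return assigned, unassigned


# Hypothetical IDs and types, for illustration only.
sample_assignments = {1: "Engineering", 2: "Security", 3: "Executive"}
assigned, unassigned = split_by_assignment([1, 2, 3, 101], sample_assignments)
```

Only the `assigned` part would feed the routines that are specific to an employee type.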

Conflicting Data:
When we merged the loyalty card data and the credit card data during preprocessing, we found that the amounts on the loyalty card and the credit card differed by 1 or 2 digits in some instances. Fig 3.3 shows the differences in amounts between loyalty and credit card transactions.

Figure 3.3 :Difference in amounts between loyalty and credit card transactions

We resolved this by assuming that the credit card amount is reliable, and changed the values in the loyalty card transactions accordingly.
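
This reconciliation rule can be sketched in Python as follows. Matching the two sources on (person, date, location) and treating that key as unique are assumptions made for the illustration, as are the function name `reconcile` and the sample records.

```python
def reconcile(credit, loyalty):
    """Overwrite loyalty amounts with the credit card amount for matching transactions.

    Both inputs are dicts keyed by (person, date, location) -> amount; the key is
    assumed to identify a transaction uniquely.
    Returns (corrected_loyalty, changes), where changes lists
    (key, old_loyalty_amount, credit_amount) for every overwritten record.
    """
    corrected = dict(loyalty)
    changes = []
    for key, credit_amount in credit.items():
        if key in loyalty and loyalty[key] != credit_amount:
            changes.append((key, loyalty[key], credit_amount))
            corrected[key] = credit_amount
    return corrected, changes


# Hypothetical records, for illustration only.
sample_credit = {
    ("P1", "2014-01-06", "Cafe"): 12.50,
    ("P2", "2014-01-06", "Cafe"): 8.00,
}
sample_loyalty = {
    ("P1", "2014-01-06", "Cafe"): 12.50,
    ("P2", "2014-01-06", "Cafe"): 28.00,
    ("P3", "2014-01-07", "Deli"): 5.00,
}
corrected, changes = reconcile(sample_credit, sample_loyalty)
```

Loyalty records with no matching credit card record are left unchanged, and the original `loyalty` dict is not modified.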